Appendix for Unsupervised Motion Representation Learning with Capsule Autoencoders

Neural Information Processing Systems

We list in the table below the notation, grouped by module; the values used in our implementation are given where applicable. The necessity of a two-layer hierarchy is briefly discussed in Section 3.3: in short, a single-layer hierarchy struggles to capture long-term dependencies and variations. This section describes an empirical study in which we compare MCAE with its single-layer counterpart.


Unsupervised Motion Representation Learning with Capsule Autoencoders

Neural Information Processing Systems

We propose the Motion Capsule Autoencoder (MCAE), which addresses a key challenge in the unsupervised learning of motion representations: transformation invariance. At the lower level, a spatio-temporal motion signal is divided into short, local, semantic-agnostic snippets. At the higher level, the snippets are aggregated to form full-length, semantic-aware segments. At both levels, we represent motion with a set of learned transformation-invariant templates and the corresponding geometric transformations, using capsule autoencoders of a novel design. This leads to a robust and efficient encoding of viewpoint changes.
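
To make the two-level design concrete, below is a minimal PyTorch sketch of the idea described in the abstract. It is not the authors' implementation: the module names, layer sizes, the soft template assignment, and the affine parameterization of the geometric transformation are all illustrative assumptions. A snippet-level autoencoder explains each short trajectory snippet as a blend of learned, transformation-invariant templates plus a per-snippet affine transform; a segment-level module then aggregates the snippet capsule activations into a full-length segment representation, and the whole model is trained with an unsupervised reconstruction loss.

# Minimal two-level motion capsule autoencoder sketch (illustrative only;
# all names, dimensions, and the affine parameterization are assumptions,
# not the authors' implementation).
import torch
import torch.nn as nn

class SnippetCapsuleAE(nn.Module):
    """Lower level: explain each short 2D-trajectory snippet as one of
    n_templates learned templates plus a per-snippet affine transform."""
    def __init__(self, snippet_len=8, n_templates=16):
        super().__init__()
        # Learned transformation-invariant templates: (K, T, 2) point sequences.
        self.templates = nn.Parameter(torch.randn(n_templates, snippet_len, 2) * 0.1)
        self.encoder = nn.Sequential(
            nn.Flatten(start_dim=-2),            # (B, S, T, 2) -> (B, S, T*2)
            nn.Linear(snippet_len * 2, 64), nn.ReLU(),
            nn.Linear(64, n_templates + 6),      # template logits + 6 affine params
        )
        self.n_templates = n_templates

    def forward(self, snippets):                 # snippets: (B, S, T, 2)
        h = self.encoder(snippets)
        logits, theta = h[..., :self.n_templates], h[..., self.n_templates:]
        weights = logits.softmax(dim=-1)         # soft template assignment
        A = theta[..., :4].reshape(*theta.shape[:-1], 2, 2)  # 2x2 linear part
        A = A + torch.eye(2, device=A.device)    # bias transform toward identity
        t = theta[..., 4:]                       # 2D translation
        # Blend templates, then apply the predicted transform to reconstruct.
        canon = torch.einsum('bsk,ktd->bstd', weights, self.templates)
        recon = torch.einsum('bsde,bste->bstd', A, canon) + t[:, :, None, :]
        return recon, weights

class SegmentCapsuleAE(nn.Module):
    """Higher level: aggregate per-snippet capsule activations into a
    full-length, semantic-aware segment representation."""
    def __init__(self, n_templates=16, n_segments=10):
        super().__init__()
        self.rnn = nn.GRU(n_templates, 64, batch_first=True)
        self.head = nn.Linear(64, n_segments)

    def forward(self, snippet_weights):          # (B, S, K)
        _, h = self.rnn(snippet_weights)
        return self.head(h[-1])                  # segment capsule activations

# Usage: reconstruct trajectories and read off the invariant representation.
x = torch.randn(4, 12, 8, 2)                     # batch of 12-snippet trajectories
lower, upper = SnippetCapsuleAE(), SegmentCapsuleAE()
recon, w = lower(x)
loss = ((recon - x) ** 2).mean()                 # unsupervised reconstruction loss
segment_repr = upper(w)

Because the templates are canonical and the viewpoint-dependent part of each snippet is absorbed into the predicted affine transform, the template-assignment weights (and the segment representation built from them) stay largely invariant under viewpoint changes, which is the property the abstract highlights.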

